NSF PAR Search | NSF Public Access Repository


SynSetExpan: An Iterative Framework for Joint Entity Set Expansion and Synonym Discovery

https://doi.org/10.18653/v1/2020.emnlp-main.666

Shen, Jiaming; Qiu, Wenda; Shang, Jingbo; Vanni, Michelle; Ren, Xiang; Han, Jiawei (November 2020, EMNLP'20: 2020 Conf. on Empirical Methods in Natural Language Processing, Nov. 2020)
Entity set expansion and synonym discovery are two critical NLP tasks. Previous studies accomplish them separately, without exploring their interdependences. In this work, we hypothesize that these two tasks are tightly coupled because two synonymous entities tend to have similar likelihoods of belonging to various semantic classes. This motivates us to design SynSetExpan, a novel framework that enables the two tasks to mutually enhance each other. SynSetExpan uses a synonym discovery model to add popular entities’ infrequent synonyms to the set, which boosts set expansion recall. Meanwhile, the set expansion model, being able to determine whether an entity belongs to a semantic class, can generate pseudo training data to fine-tune the synonym discovery model towards better accuracy. To facilitate research on the interplay between these two tasks, we create the first large-scale Synonym-Enhanced Set Expansion (SE2) dataset via crowdsourcing. Extensive experiments on the SE2 dataset and previous benchmarks demonstrate the effectiveness of SynSetExpan for both entity set expansion and synonym discovery tasks.
Full Text Available
Mining Entity Synonyms with Efficient Neural Set Generation

https://doi.org/10.1609/aaai.v33i01.3301249

Shen, Jiaming; Lyu, Ruiliang; Ren, Xiang; Vanni, Michelle; Sadler, Brian; Han, Jiawei (July 2019, Proceedings of the AAAI Conference on Artificial Intelligence)

Mining entity synonym sets (i.e., sets of terms referring to the same entity) is an important task for many entity-leveraging applications. Previous work either ranks terms based on their similarity to a given query term, or treats the problem as a two-phase task (i.e., detecting synonymy pairs, followed by organizing these pairs into synonym sets). However, these approaches fail to model the holistic semantics of a set and suffer from the error propagation issue. Here we propose a new framework, named SynSetMine, that efficiently generates entity synonym sets from a given vocabulary, using example sets from external knowledge bases as distant supervision. SynSetMine consists of two novel modules: (1) a set-instance classifier that jointly learns how to represent a permutation-invariant synonym set and whether to include a new instance (i.e., a term) in the set, and (2) a set generation algorithm that enumerates the vocabulary only once and applies the learned set-instance classifier to detect all entity synonym sets in it. Experiments on three real datasets from different domains demonstrate both the effectiveness and the efficiency of SynSetMine for mining entity synonym sets.
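The single-pass set generation idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the threshold, and the toy scoring function are all assumptions made for illustration. In SynSetMine the score would come from the learned set-instance classifier; here a crude string-normalization check stands in for it.

```python
from typing import Callable, List, Sequence


def generate_synonym_sets(
    vocab: Sequence[str],
    score: Callable[[List[str], str], float],
    threshold: float = 0.5,
) -> List[List[str]]:
    """Enumerate the vocabulary once; place each term in the best-scoring
    existing set, or start a new set if no score exceeds the threshold."""
    clusters: List[List[str]] = []
    for term in vocab:
        best_idx, best_score = -1, threshold
        for i, cluster in enumerate(clusters):
            s = score(cluster, term)
            if s > best_score:
                best_idx, best_score = i, s
        if best_idx >= 0:
            clusters[best_idx].append(term)
        else:
            clusters.append([term])
    return clusters


def toy_score(cluster: List[str], term: str) -> float:
    """Hypothetical stand-in for a learned set-instance classifier:
    returns 1.0 if the term matches any set member after stripping
    case and punctuation, else 0.0."""
    def norm(t: str) -> str:
        return "".join(ch for ch in t.lower() if ch.isalnum())
    return 1.0 if any(norm(m) == norm(term) for m in cluster) else 0.0


vocab = ["U.S.A.", "usa", "Illinois", "USA", "illinois"]
print(generate_synonym_sets(vocab, toy_score))
# prints [['U.S.A.', 'usa', 'USA'], ['Illinois', 'illinois']]
```

Because each term is compared against whole sets rather than individual terms, the scoring function sees the set as a unit, which is the property the abstract contrasts with pairwise, two-phase approaches.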
Full Text Available
Mining Entity Synonyms with Efficient Neural Set Generation

Shen, Jiaming; Lyu, Ruiliang; Ren, Xiang; Vanni, Michelle; Sadler, Brian; Han, Jiawei (January 2019, Proceedings of the AAAI Conference on Artificial Intelligence)

Full Text Available
TaxoGen: Unsupervised Topic Taxonomy Construction by Adaptive Term Embedding and Clustering

https://doi.org/10.1145/3219819.3220064

Zhang, Chao; Tao, Fangbo; Chen, Xiusi; Shen, Jiaming; Jiang, Meng; Sadler, Brian; Vanni, Michelle; Han, Jiawei (August 2018, Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018)

Taxonomy construction is not only a fundamental task for semantic analysis of text corpora, but also an important step for applications such as information filtering, recommendation, and Web search. Existing pattern-based methods extract hypernym-hyponym term pairs and then organize these pairs into a taxonomy. However, by considering each term as an independent concept node, they overlook the topical proximity and the semantic correlations among terms. In this paper, we propose a method for constructing topic taxonomies, wherein every node represents a conceptual topic and is defined as a cluster of semantically coherent concept terms. Our method, TaxoGen, uses term embeddings and hierarchical clustering to construct a topic taxonomy in a recursive fashion. To ensure the quality of the recursive process, it consists of (1) an adaptive spherical clustering module for allocating terms to proper levels when splitting a coarse topic into fine-grained ones; and (2) a local embedding module for learning term embeddings that maintain strong discriminative power at different levels of the taxonomy. Our experiments on two real datasets demonstrate the effectiveness of TaxoGen compared with baseline methods.
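As a rough illustration of the clustering step, the sketch below implements plain spherical k-means, which groups unit-length embeddings by cosine similarity. It covers only the basic spherical clustering idea; the adaptive step that allocates terms to proper levels and the local embedding module described in the abstract are not modeled. The function names, the caller-supplied initial centers, and the iteration limit are assumptions for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def spherical_kmeans(
    points: Sequence[Sequence[float]],
    init_centers: Sequence[Sequence[float]],
    iters: int = 20,
) -> List[int]:
    """Assign each point to the center with the highest cosine similarity,
    then move each center to the normalized mean of its members."""
    pts = [normalize(p) for p in points]
    centers = [normalize(c) for c in init_centers]
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [
            max(range(len(centers)), key=lambda j: dot(p, centers[j]))
            for p in pts
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(pts, labels) if lab == j]
            if members:
                centers[j] = normalize([sum(col) for col in zip(*members)])
    return labels


# Toy 2-D "embeddings": two terms near each axis.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(spherical_kmeans(points, init_centers=[[1.0, 0.0], [0.0, 1.0]]))
# prints [0, 0, 1, 1]
```

In a recursive taxonomy builder, each resulting cluster would become a child topic whose terms are clustered again at the next level.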
Full Text Available
HiExpan: Task-Guided Taxonomy Construction by Hierarchical Tree Expansion

https://doi.org/10.1145/3219819.3220115

Shen, Jiaming; Wu, Zeqiu; Lei, Dongming; Zhang, Chao; Ren, Xiang; Vanni, Michelle T.; Sadler, Brian M.; Han, Jiawei (August 2018, Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018)

Taxonomies are of great value to many knowledge-rich applications. As manual taxonomy curation costs enormous human effort, automatic taxonomy construction is in great demand. However, most existing automatic taxonomy construction methods can only build hypernymy taxonomies wherein each edge is limited to expressing the “is-a” relation. Such a restriction limits their applicability to more diverse real-world tasks where parent-child pairs may carry different relations. In this paper, we aim to construct a task-guided taxonomy from a domain-specific corpus, and allow users to input a “seed” taxonomy, serving as the task guidance. We propose an expansion-based taxonomy construction framework, namely HiExpan, which automatically generates a key term list from the corpus and iteratively grows the seed taxonomy. Specifically, HiExpan views all children under each taxonomy node as forming a coherent set and builds the taxonomy by recursively expanding all these sets. Furthermore, HiExpan incorporates a weakly-supervised relation extraction module to extract the initial children of a newly expanded node and adjusts the taxonomy tree by optimizing its global structure. Our experiments on three real datasets from different domains demonstrate the effectiveness of HiExpan for building task-guided taxonomies.
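The recursive expansion described in this abstract can be sketched as follows. This is an illustrative outline under stated assumptions, not HiExpan itself: the function `grow`, the lookup tables, and the depth limit are hypothetical. The two callables stand in for the set expansion step and the relation extraction step, and the global structure optimization is omitted.

```python
from typing import Callable, Dict, List

Taxonomy = Dict[str, List[str]]


def grow(
    taxonomy: Taxonomy,
    node: str,
    expand_siblings: Callable[[List[str]], List[str]],
    seed_children: Callable[[str], List[str]],
    max_depth: int,
    depth: int = 0,
) -> None:
    """Treat the children of `node` as one set, expand that set, then recurse."""
    if depth >= max_depth:
        return
    children = taxonomy.setdefault(node, [])
    if not children:
        # Newly added node: obtain initial children (stand-in for the
        # weakly-supervised relation extraction module).
        children.extend(seed_children(node))
    if children:
        # Width expansion: add new siblings to the existing child set.
        for term in expand_siblings(children):
            if term not in children:
                children.append(term)
    for child in list(children):
        grow(taxonomy, child, expand_siblings, seed_children, max_depth, depth + 1)


# Hypothetical lookup tables standing in for corpus-driven models.
GROUPS = [["US", "China", "Canada"],
          ["California", "Illinois", "Texas"],
          ["Ontario", "Quebec"]]
CHILD_OF = {"Canada": ["Ontario"], "China": ["Beijing"]}


def toy_expand(seeds: List[str]) -> List[str]:
    for group in GROUPS:
        if any(s in group for s in seeds):
            return [t for t in group if t not in seeds]
    return []


def toy_seed(node: str) -> List[str]:
    return list(CHILD_OF.get(node, []))


taxonomy: Taxonomy = {"ROOT": ["US", "China"], "US": ["California", "Illinois"]}
grow(taxonomy, "ROOT", toy_expand, toy_seed, max_depth=2)
print(taxonomy)
```

Starting from the seed taxonomy, the sketch adds "Canada" as a sibling of "US" and "China", extends the children of "US" with "Texas", and gives the new node "Canada" the children "Ontario" and "Quebec".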
Full Text Available
